Measuring Amok Term Paper for CS 224U: Natural Language Understanding

Authors

  • Richard Futrell
  • Samuel Bowman
Abstract

We propose and compare a number of metrics to capture the degree to which words are restricted in the contexts in which they can occur. We re-frame the problem of contextual restrictedness, and introduce the use of vector space models based on syntactic dependencies. We show that our best-performing metric, residualized entropy, is quite successful in selecting highly collocationally restricted words and is predictive of animacy.

Similar resources

Towards Measuring Scalability In Natural Language Understanding Tasks

In this paper we present a discussion of existing metrics for evaluating the performance of individual natural language understanding systems and components, as well as the commonly employed metrics for measuring the specific task difficulties. We extend and generalize the common majority class baseline metric and introduce a general entropy-based metric for measuring the task difficulty of arb...

A Tool for Measuring the Reality of Technology Trends of Interest

In this paper, we present a prototype application – the Technology Trend Tracker – to measure the reality of technology trends of interest using information on the Web to inform decisions such as when to develop training, when to invest in expertise, and more. This prototype performs this task by integrating several artificial intelligence technologies in an innovative way. These technologies i...

An information theoretic approach for using word cluster information in natural language call routing

In this paper, an information theoretic approach for using word clusters in natural language call routing (NLCR) is proposed. This approach utilizes an automatic word class clustering algorithm to generate word classes from the word based training corpus. In our approach, the information gain (IG) based term selection is used to combine both word term and word class information in NLCR. A joint...

Computing Science Group CS-RR-10-04

We present a method for automatically creating large-scale semantic networks from natural language text, based on deep semantic analysis. We provide a robust and scalable implementation, and sketch various ways in which the representation may be deployed for conceptual knowledge acquisition. A translation to RDF establishes interoperability with a wide range of standardised tools, and bridges t...

Memo CS – 03 – 09

This paper concerns infrastructural work in the fields of Language Engineering, Natural Language Processing and Computational Linguistics. We begin by defining the area of software support for research and development of components in these areas as Software Architecture for Language Engineering (SALE). The rest of the paper reviews contributions to this field, covering a wide range of work ove...


Publication date: 2012